ABSTRACT Acinetobacter baumannii is a globally important nosocomial pathogen characterized by an increasing incidence of 
multidrug resistance. Routes of dissemination and gene flow among health care facilities are poorly resolved and are important 
for understanding the epidemiology of A. baumannii, minimizing disease transmission, and improving patient outcomes. We 
used whole-genome sequencing to assess diversity and genome dynamics in 49 isolates from one United States hospital system 
during one year from 2007 to 2008. Core single-nucleotide-variant-based phylogenetic analysis revealed multiple founder strains 
and multiple independent strains recovered from the same patient yet was insufficient to fully resolve strain relationships, where 
gene content and insertion sequence patterns added discriminatory power. Gene content comparisons illustrated extensive 
and redundant antibiotic resistance gene carriage and direct evidence of gene transfer, recombination, gene loss, and 
mutation. Evidence of barriers to gene flow among hospital components was not found, suggesting complex mixing of strains 
and a large reservoir of A. baumannii strains capable of colonizing patients. 

IMPORTANCE Genome sequencing was used to characterize multidrug-resistant Acinetobacter baumannii strains from one 
United States hospital system during a 1-year period to better understand how A. baumannii strains that cause infection are re- 
lated to one another. Extensive variation in gene content was found, even among strains that were very closely related phyloge- 
netically and epidemiologically. Several mechanisms contributed to this diversity, including transfer of mobile genetic elements, 
mobilization of insertion sequences, insertion sequence-mediated deletions, and genome-wide homologous recombination. 
Variation in gene content, however, lacked clear spatial or temporal patterns, suggesting a diverse pool of circulating strains 
with considerable interaction between strains and hospital locations. Widespread genetic variation among strains from the same 
hospital and even the same patient, particularly involving antibiotic resistance genes, reinforces the need for molecular diagnos- 
tic testing and genomic analysis to determine resistance profiles, rather than a reliance primarily on strain typing and antimicro- 
bial resistance phenotypes for epidemiological studies. 
Nosocomial infections are a significant public health concern 
and economic cost to health care systems (1,2). Therefore, a 
critical need exists for a better understanding of nosocomial 
pathogen population dynamics and epidemiology to improve di- 
agnostics and infection control efforts. Comparative whole- 
genome sequencing (WGS) offers opportunities to address these 
issues, with analyses expected to also lead to increased under- 
standing of pathogen transmission routes and the movement of 
targeted genetic elements such as antimicrobial resistance or 
virulence-associated genes (3-7). 

One nosocomial pathogen to emerge in recent years is Acineto- 
bacter baumannii, which is of particular concern in light of the 
global occurrence of multidrug-resistant (MDR) and pan-drug- 
resistant strains (8-11). A. baumannii was an uncommon pathogen 
in health care settings until recent decades but is now a leading 
cause of ventilator-associated pneumonia and surgical and 
urinary tract infections, among other illnesses (1). This increase in 
the prevalence of A. baumannii in health care-associated infec- 
tions has occurred in conjunction with an increase in the preva- 
lence of MDR strains, with 60% of the isolates in the United States 
reported as MDR according to recent surveillance data (1). 

Drug resistance is a major factor contributing to the suc- 
cess of A. baumannii in hospital settings (12-15). Genetic charac- 
terizations of A. baumannii strains have revealed that they possess 
a diverse and extensive arsenal of chromosomal and plasmid- 
borne resistance genes (16, 17). For example, A. baumannii possesses 
the intrinsic chromosomal β-lactamase genes blaADC and 
blaOXA-51-like, and overexpression of these genes driven by promoters 
in upstream insertion sequence (IS) elements can confer 
resistance to extended-spectrum cephalosporins and carbapen- 
ems, respectively (18-21). Several allelic variants of each gene 
from clinical A. baumannii strains have been described (22-24), 
but less is known about the dynamics of alleles within A. bauman- 
nii populations. 

Another hallmark of A. baumannii antibiotic resistance mech- 
anisms is the prevalence of resistance islands (RIs) where 
transposon- and integron-derived modules have played a role in 
mobilizing genes conferring resistance to several classes of drugs, 
including the frontline carbapenem and aminoglycoside antibiot- 
ics. Many recent studies highlight the diversity in the genomic 
location, architecture, and content of these RIs, demonstrating the 
dynamic nature of A. baumannii antibiotic resistance mechanisms 
and the adaptive significance of these elements (25-27). Plasmid- 
borne resistance genes are also reported in A. baumannii, where 
the association of resistance genes with IS and site-specific recom- 
bination systems facilitates their dispersal (28-30). Though the 
analysis of the origin and movement of antibiotic resistance mech- 
anisms remains a research priority, the diversity and distribution 
of resistance genes circulating among the A. baumannii strains 
within one hospital and their genetic context remain poorly re- 
solved. 

Despite increased research efforts into A. baumannii epidemi- 
ology and evolution (15, 31, 32), large gaps remain in our under- 
standing of the evolutionary processes that contribute to strain 
diversification within hospital environments. Processes at this 
scale are important for understanding transmission routes and 
whether infections are primarily a result of patient-to-patient 
contact (including via mediators such as health care workers or 
environmental surfaces) or the result of multiple founder events. 
Previous examinations of A. baumannii epidemiology and diver- 
sity have focused on the examination of strain dynamics and gene 
content across time and space by using coarser measures of relat- 
edness such as multilocus sequence typing (MLST), pulsed-field 
gel electrophoresis fingerprinting, or multilocus variable-number 
tandem-repeat analysis profiles (15, 33-38) and have demon- 
strated the successive clonal spread of three primary lineages, 
global clones I to III, in hospitals worldwide (34). While several 
sequence types (STs) of A. baumannii strains can coexist within 
hospitals (39, 40), the limited resolution of these typing schemes 
masks the extent of lateral gene transfer (LGT) and recombina- 
tion, which are critical for driving strain differentiation. A more 
recent whole-genome comparative analysis of within-hospital 
strain dynamics suggested that multiple founder A. baumannii 
strains can be present simultaneously and noted the role of 
genome-wide recombination in within-hospital strain diversifica- 
tion (32). 

To further expand what is known about A. baumannii evolu- 
tionary processes and population dynamics, we used comparative 
analysis of genome sequences from 49 A. baumannii strains to 
examine strain level genetic diversity and gene content variation 
over a 1-year period within one United States hospital system. By 
studying many strains from an interconnected health care envi- 
ronment, we aimed to gain a better understanding of routes of 
transmission between patients and within and among hospitals, 
the diversity of the population of founder strains, and the extent to 
which LGT, recombination, and mutation contribute to A. bau- 
mannii evolution. 



RESULTS 

We sequenced the genomes of 49 A. baumannii strains that were 
isolated from patients in one tertiary-care hospital, three regional 
hospitals, and an extended-care facility that are part of a single 
integrated hospital system. Earlier work showed that the majority 
of the isolates are from a single MLST group, implying that the 
outbreak was primarily clonal, with patient-to-patient transmission 
explaining most of the new infections (37). Genome sequencing 
confirmed that most of the strains belonged to global clone 2 
(GC2) (ST2 in the nomenclature of Diancourt et al. [34]), with 
additional strains making up either new (three strains) or different 
MLST groups (seven strains were ST79) (see Table S1 in the supplemental 
material). To facilitate detailed analysis of strain relationships 
and correlation with clinical and phenotypic features, 
we first developed a robust phylogeny based on single-nucleotide 
variants (SNVs) that are present in core regions of the genome to 
represent ancestral relationships among strains. Because recom- 
bination in A. baumannii was previously reported (32), we also 
excluded regions of elevated SNV density from the analysis, where 
SNV patterns reflect recent recombination and not shared ancestry. 
Removal of these ~30,000 SNVs altered the topology of the 
phylogenetic tree by changing both the placement of a major lin- 
eage (i.e., clade D was initially on the same branch with ACICU) 
and the placement of interior branches (i.e., UH5307 and 
UH19908 in the original tree were in clade B). 
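
The exclusion of high-density SNV regions described above can be illustrated with a minimal sketch. The text does not state how elevated SNV density was defined, so the non-overlapping window size and the per-window threshold below are assumptions chosen for illustration only, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def dense_windows(positions: List[int], genome_length: int,
                  window: int = 1000, max_snvs: int = 10) -> List[Tuple[int, int]]:
    """Return half-open (start, end) windows holding more than max_snvs SNVs.

    Positions are 0-based reference coordinates. The window size and the
    threshold are illustrative assumptions, not values from the study.
    """
    counts: Dict[int, int] = {}
    for pos in positions:
        idx = pos // window
        counts[idx] = counts.get(idx, 0) + 1
    flagged = []
    for idx in sorted(counts):
        if counts[idx] > max_snvs:
            end = min((idx + 1) * window, genome_length)
            flagged.append((idx * window, end))
    return flagged


def filter_snvs(positions: List[int], genome_length: int,
                window: int = 1000, max_snvs: int = 10) -> List[int]:
    """Drop every SNV that falls inside a flagged high-density window."""
    flagged_starts = {start for start, _ in
                      dense_windows(positions, genome_length, window, max_snvs)}
    return [p for p in positions if (p // window) * window not in flagged_starts]


# Toy coordinates (not data from the study): four SNVs cluster in one window.
toy_positions = [5, 1500, 1501, 1502, 1503, 3000]
kept = filter_snvs(toy_positions, genome_length=4000, window=1000, max_snvs=3)
print(kept)
```

Fixed, non-overlapping windows keep the sketch short; a sliding window or a dedicated recombination-detection method would likely flag region boundaries more precisely.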

The core SNV tree revealed five well-supported primary subclades, 
with four of them (A to D) representing highly similar GC2 
isolates (Fig. 1A and B). Each clade was composed of strains from 
more than one hospital (Fig. 1C) and spanning the study period 
(see Table S1 in the supplemental material). The clades were interspersed 
with A. baumannii strains reported by others that were 
isolated from disparate geographic locations worldwide. 

Pangenome: core and accessory genes. To explore additional 
genetic features that might shed light on strain relatedness, we 
compared the gene contents of the 49 University Hospitals (UH) 
and 20 reference A. baumannii strains by using the pangenome 
PanOCT analysis software. We determined that 2,651 open read- 
ing frames (ORFs) were common to all 69 strains and 2,906 ORFs 
were common to all 55 GC2 isolates. An additional 2,427 ORFs 
were present in subsets of at least two GC2 strains, illustrating the 
extent of gene content variability within this group. A gene con- 
tent tree based on a pairwise distance matrix of shared gene con- 
tent has a topology much different from that of the core SNV tree, 
yet the well-supported SNV-based clades are preserved (Fig. 1D). 
The difference in tree topology is driven primarily by gene gain of 
laterally acquired plasmid- and phage-associated genes and by 
IS-mediated gene loss, as discussed below. Significant variation in 
gene content was found, with only a few strains having identical 
genetic complements. In fact, in the 39 UH GC2 genomes ana- 
lyzed here, there are 24 distinct gene sets. In the following sections, 
we describe the nature of these gene gain and loss events and their 
association with antibiotic resistance genes and genes that may be 
associated with other traits of clinical interest. 

Chromosomal gene gains: RIs. Variations at three RI locations 
and structures were found in the UH strains (Fig. 2A). Two of the 
RIs were similar to previously identified GC2 RIs, with the third 
representing a previously undescribed RI location in A. bauman- 
nii. RI distribution among clades was generally consistent with the 
FIG 1 Phylogenetic trees. (A) Core SNV maximum-likelihood phylogenetic tree constructed from a whole-genome alignment in Mauve and filtered to remove 
recombinant regions, resulting in 129,899 SNVs used for tree building. (B) Core SNV phylogenetic tree showing relationships among only the ST2 (or GC2) 
genomes. (C) Core SNV phylogenetic tree topology showing the relationships among all of the strains. Confidence values from 100 bootstrap iterations are shown 
at the nodes. Clade color boxes represent well-supported primary lineages containing UH strains. Strain name boxes are color coded by the hospital of origin, 
with reference genomes left uncolored, where green indicates a tertiary hospital, blue indicates an ECC, yellow indicates community hospital A, pink indicates 
community hospital B, and gray indicates community hospital C. (D) Gene content tree. Information about the percentage of shared genes from PanOCT 
pangenome clustering for each pair of strains was converted into a distance matrix and used to construct a neighbor-joining tree. Vertical color bars between 
panels C and D highlight the placement of primary clades in core SNV and gene content trees. 



phylogenetic tree, but with a few important differences. Long 
reads from Pacific Biosciences RS sequencer (PacBio) single- 
molecule real-time (SMRT) sequencing aided in defining the ge- 
netic structure and chromosomal position of RIs in UH9907 and 
UH10707. 

AbaR4-type RIs (also called AbGRI1 [26]) were inserted into 
the comM gene of all GC2 UH strains, except for the UH7007 clade, 
which had no RI at this position, and are primarily absent from 
non-GC2 strains, except for reference strains ATCC 17978, AYE, 
and AB0057. Non-GC2 UH strain UH5207 contained genes associated 
with metal resistance in this RI. The primary difference 
within the GC2 strains centered on whether a second copy of a 
Tn6022-like transposon was present in the RI. This version of the 
RI, as determined from the UH9907 PacBio assembly, was identical 
to RI TYTH-1 of TYTH-1, which was isolated in Taiwan in 2008 
(41). The other variant, as determined from the UH6107 and 
UH10707 assemblies, was identical to AbaR4a (42), but the 
UH10707 version had an additional ISAba1 copy downstream 
FIG 2 Genome features. The phylogenetic tree from Fig. 1C is shown at the top with the well-supported clades labeled. (A) RI gene content. Bar colors represent 
different variants; blank means that no RI is present. Horizontal bars in the comM RI row indicate the presence of ISAba1 at the same location. Black circles here 
indicate that an RI is present but is not similar to the UH strain RIs in content or organization. In the astA RI row, black circles indicate that an IS26-mediated 
deletion was present while blue circles indicate that an AbGRI2-like RI is present here. In the ACICU_02399 row, bar colors show whether Tn1548 is present at 
the ACICU_02399 location (pink) or at an undetermined location (gray) (see text for further details of variations). (B) Mobile genetic element distribution, 
where blank indicates that an element is not present. pABUH1 (gray) carries blaOXA-23 and aphA6, while pACICU2 (orange) does not. pABUH2 and pABUH3 
color differences reflect the presence of blaOXA-40 on different plasmid backbones. pABUH4 and pABUH5 are large plasmids (~110 kb) carrying phage-related 
sequences. pABUH6 color differences reflect two variations, where pABUH6a (pink) is essentially identical to pAB0057 (CP001183) and pABUH6b (orange) has 
an additional 3 kbp, including a putative mob ORF (Fig. 1). Phage content refers to the phage-related region located at ACICU 1.11 Mbp, where blue indicates 
that both phages are present, pink indicates that the first phage is present, orange indicates that the second phage is present, and green indicates that a different 
phage is present (see the text for details). (C) Variation in the presence of a T6SS gene cluster and the csuE gene cluster. Shading, present; blank, absent. (D) 
Surface polysaccharide variants for the OC and capsular polysaccharide (K) loci. Blank, the structure is unique to that strain; horizontal bar color represents 
ISAba1 insertion locations within each locus. (E) Intrinsic chromosomal β-lactamase alleles. Blank indicates unique alleles, blue indicates that blaADC is the ADC30 
variant, and horizontal bars indicate the presence of an upstream ISAba1 for blaOXA-51-like. (F) Variant region showing recombination around the heme 
utilization region. Orange indicates that the heme region is present, and gray indicates that the region is absent. 



from the sul ORF. The distribution of the two variants among UH 
GC2 strains did not strictly conform to the phylogeny. 

An AbGRI2-type island (sensu reference 26) was located adja- 
cent to the arginine N-succinyltransferase gene (astA) (1.318-Mb 
A1S_1093 in ATCC 17978) and associated with an ~40-kb chro- 
mosomal deletion (A1S_1093 to A1S_1126 in ATCC 17978), as 
found in other GC2 strains (26). The only non-GC2 strain to have 
a deletion in this region was UH7607. Within the GC2 strains, 
there were two variants whose distributions were primarily consistent 
with the phylogeny. The structure and content of one variant 
(clades A and C) were similar to those of the AbaR-like region 
of AbGRI2-1 from Australian reference strain WM99c containing 
blaTEM, which confers resistance to ampicillin, and aphA1, which 
confers resistance to kanamycin and neomycin, except that it 
lacked the class 1 integron portion of the element (26). Genomes 
in clade B had a larger deletion at this location (65 kb relative to 
1.264 Mb [A1S_1083] to 1.329 Mb [A1S_1138] in ATCC 17978), 
with an IS26 copy at the deletion site but no apparent antibiotic 
resistance genes (e.g., UH8907). The second variant (clade D) did 
not have the 40-kb chromosomal deletion but had a smaller deletion 
in the opposite direction (1.318 Mb [A1S_1127] to 1.324 Mb 
[A1S_1134] in ATCC 17978) and contained aphA1 and aac3IIa 
(resistance to gentamicin) (e.g., UH10707). 

The third and previously undescribed RI was located in a region 
corresponding to ~2.53 Mb in ACICU between genes encoding 
a major facilitator superfamily transporter and a putative 
GNAT family acetyltransferase (ACICU_02398 and ACICU_02399), 
where the content and organization are similar to 
those of Tn1548 (43). This RI has been reported on plasmid pZJ06 
(CP001938) (44). This chromosomal location was confirmed by 
using PCR to verify RI flanking regions (data not shown). Several 
resistance genes are located within this element that confer resistance 
to aminoglycosides through target modification (armA) and 
enzymatic modification [aadA1, aac(6')-Ib, aphA1] and to chloramphenicol 
(catB8). The distribution of the Tn1548-like RI is 
largely consistent with the phylogeny, and the RI is present in GC2 
reference strain TYTH-1. Tn1548 is also present in the draft genomes 
of MDR-TJ and AB210, but its location could not be confirmed. 
The exception to this distribution is UH10007, which does not carry the 
Tn1548-like element but has a deletion at the same location. However, 
the RI is present in closely related isolate UH10107, which 
was recovered from the same patient as UH10007. UH8407 and 
UH8707 have an IS26 element at this location and some Tn1548 
genes, but the draft assembly state precludes precise definition of 
the structure. UH20108 and UH18608 have an IS26 copy at this 
location but no apparent RI. 

Mobile gene content differences. Plasmid distribution among 
the UH and reference strains reveals a dynamic exchange occur- 
ring across hospital locations and within patients (Fig. 2B). One 
large pACICU2-like plasmid (designated pABUH1) present in the 
UH strains carries blaOXA-23 and aphA6, both associated with IS 
elements (Fig. 2B; see Fig. S1 in the supplemental material). A 
fragment of ISAba1 is located upstream of aphA6, whereas 
blaOXA-23 is flanked by ISAba125 copies in the same orientation 
and inserted at the location where pACICU2 has a single ISAba125 
copy. Other non-UH strains carry blaOXA-23 in RIs, except A. baumannii 
UMB001, where it is also on a pACICU2-like plasmid. The 
plasmid occurs in two GC2 clades, A and D, and is variably present 
in a third, B. The dynamic nature of plasmid transfer is illustrated 
in three strains derived from a single patient over a 14-day period. 
Strains UH3807 (day 1) and UH6207 (day 14) have essentially 
identical genome sequences (clade B), except that UH6207 carries 
pABUH1. UH6107, isolated on day 7, has a clade A sequence. Our 
analysis suggests that pABUH1 was transferred from UH6107 to 
the UH3807-UH6207 background in the context of a mixed infection 
in this patient. The other clade B genomes carrying 
pABUH1 were isolated after UH6207. 

Antibiotic resistance plasmids harboring blaOXA-24/40 were restricted 
to clade E but found across hospital locations. Strains in 
clade E each carried two novel plasmids (pABUH2 and pABUH3) 
that have different rep and mob genes (Fig. 2B; see Fig. S1 in the 
supplemental material), where the rep gene in pABUH2 was similar 
to GR12 in the replicon typing scheme of Bertini et al. (45), but 
the rep gene in pABUH3 was novel and most similar to rep genes 
from plasmids in A. lwoffii. In one subclade, blaOXA-24/40 occurred 
on pABUH2 (e.g., UH19608), and in the other subclade, it occurred 
on pABUH3 (e.g., UH7607). The blaOXA-24/40 gene is 
flanked by XerC/XerD recombination sites on both plasmids, 
consistent with previous work demonstrating that this site-specific 
recombination system is mobilizing blaOXA-24/40 in A. baumannii 
(28, 29). The same alternative gene encoding a putative 
acetyltransferase is located at the corresponding location on the 
other, non-blaOXA-40 plasmid, also flanked by XerC/XerD sites. 
The XerC/XerD inverted repeat and intervening sequences from 
the two plasmids are nearly identical (see Fig. S1 in the supplemental 
material), with two nucleotide differences in the upstream 
XerC/XerD site that suggest a recombination event between the 
two plasmids at the upstream XerD location. 

Two additional large plasmids, pABUH4 and pABUH5, each 
~110 kbp and containing primarily phage-related sequences, were 
identified in the UH strains. pABUH4 is similar to pABTJ2 from 
Chinese isolate MDR-TJ (contig accession number CP004359). 
The pABUH5 plasmid in the assembly of UH10707 obtained by 
the hierarchical genome assembly process (HGAP) also carries the 
element IS26-aphA1-IS26-mph2-msrE-IS26-IS26, which encodes 
kanamycin and neomycin (aphA1) and macrolide (mph2 and 
msrE) resistance. Some strains carry both of these plasmids (clade 
B), and the two plasmids may be joined across two phage integrase 
junctions (V426_1793 and V426_1913) in these strains, as indicated 
by the UH9907 HGAP PacBio assembly (AYOH00000000). 
The distribution of these plasmids is primarily consistent with the 
phylogeny, and they are present in reference genomes. Additional 
plasmids in the UH strains include two pAB0057-like (CP001183) 
plasmids of 8.7 and 10 kb showing highly variable distribution in 
the UH and reference strains (pABUH6, Fig. 2B). 

Phage-related regions are another source of variability contrib- 
uting to differentiation among A. baumannii strains. In GC2 ge- 
nomes, three primary variants of a phage region are all located 
within the same chromosomal location corresponding to 1.11 to 
1.16 Mbp in ACICU (ACICU_00997 to ACICU_01077) or 1.45 to 
1.53 Mbp in TYTH-1 (M3Q_1334 to M3Q_1458), where variant 
distribution within the GC2 strains lacks a discernible geographic 
or phylogenetic pattern (Fig. 2B). One primary variant typified by 
the finished reference genome of TYTH-1 consists of two likely 
phage insertion events flanked by phage integrases. The other 
variant typified by ACICU has only one of the two phage elements 
(~45 kb). A third variant (e.g., UH0207-like) possesses only the 
second of the two phage elements (~36 kb). Strains in non-GC2 
clade E vary in whether a phage region is present here. Interest- 
ingly, the four strains without a phage insertion at this location 
(i.e., UH6907, UH7607, UH7907, and UH22907) are also the only 
four UH strains to possess clustered regularly interspaced short 
palindromic repeat systems. This location is not the only phage- 
associated region in the genomes examined, as the UH9907 and 
UH10707 HGAP assemblies suggest that each has at least two 
other chromosomal prophage regions. 

Chromosomal gene losses: IS-mediated deletions. There are 
multiple instances of chromosomal gene loss that are likely medi- 
ated by an IS adjacent to the deletion (Fig. 2C). One clade has two 
large deletions adjacent to ISAba1 elements that have not been 
previously described, including a 40-kb region (1.377 to 
1.426 Mbp in ACICU, corresponding to ORFs ACICU_01272 to 
ACICU_01320) encoding the entire type VI secretion system 
(T6SS) (Fig. 2C). A second ISAba1-associated deletion of ~20 kbp 
(2.538 to 2.558 Mbp in ACICU, corresponding to ORFs ACICU_02401 
to ACICU_02418) occurred in a region of adhesion 
genes (csuE) and aspartate metabolism near the insertion of the 
Tn1548-like RI at ACICU_02399. The csuE region is also absent 
from reference strain MDR-ZJ06. 
Surface polysaccharides. The UH strains show substantial 
variation in the content and organization of loci involved in sur- 
face polysaccharide synthesis (Fig. 2D), yet there is no evidence of 
recombination occurring among the UH strains at either region 
involved in surface polysaccharide production as has been re- 
ported in other strain collections (32). Core lipooligosaccharide 
(LOS) loci (outer core [OC] locus sensu reference 46) consist of 
one type, OCL1, that is present in most strains, but UH strains 
have experienced a number of independent ISAba1 insertions 
across the clades (horizontal bars in Fig. 2D). GC2 strains in clades 
A, B, and C have identical capsular (K locus) loci most closely 
related to ACICU (99.5% identity at the nucleotide level, KL2 in 
reference 46), except for two strains isolated from the same patient 
(UH9007 and UH9707) that have a unique ISAba1 insertion at the 
same position (Fig. 2D). The K locus of clade D strains is most 
similar to that of MDR-TJ (98.6% nucleotide identity) or KL9, 
while all other clades have unique K loci. 

Intrinsic chromosomal resistance genes. The phylogenetic 
distribution of different alleles of blaOXA-51-like and blaADC shows 
evidence of recombination and mutation. Each primary UH clade 
has a different blaADC allele, yet the gene is always associated with 
an upstream ISAba1 oriented to allow overexpression of the gene 
(except for UH51007, UH5207, and UH6507, which do not have 
an upstream ISAba1 copy, Fig. 2E) and present at the same chromosomal 
location. Sequences from clades A to D are either identical 
to the extended-spectrum ADC30 blaADC variant (47) (clade 
C) or differ in the Ω loop by either the substitution or the addition 
of another amino acid (see Fig. S2 in the supplemental material). 
However, non-GC2 clade E has a nucleotide insertion at bp 102 of 
the coding regions, likely resulting in a truncated ADC protein. 
Unlike the blaADC allelic distribution, the allelic distribution of the 
blaOXA-51-like gene, and whether ISAba1 is present upstream, is best 
explained by a recombination event. The coding variant in clades 
B and D (blaOXA-82), as in reference 22, is linked to the presence of 
an upstream insertion element, ISAba1, suggesting that the entire 
region was replaced by homologous recombination in one or both 
lineages. The most common blaOXA-51-like variant present in the 
other GC2 strains is blaOXA-66, with other variants differing by one 
allele or by more than one in the more distant clades (see Fig. S3 in 
the supplemental material). 

Recombinant heme utilization region. Recombination is 
contributing to strain diversification at multiple scales in the UH 
strains, including the blaOXA-51-like region and plasmids described 
above. Additionally, a large region of 50 kb was identified as a zone 
of elevated SNV density between 0.965 and 1.015 Mbp in the ACICU 
genome coordinates. A 10-kb section of this region was previously 
identified as a recombination hot spot by Snitkin et al. 
(32), but the inclusion of a broader range of strains reveals that it 
extends much farther (ORFs corresponding to ACICU_00866 to 
ACICU_00912). There are two major variant types that differ in 
the presence of a 12-kb region containing several genes involved in 
heme utilization (ACICU_00871 to ACICU_00882, Fig. 2F). 
Other large regions where recombination is apparent are around 
the origin of replication in ACICU and the K locus, as described by 
Snitkin et al. (32). 

Coinfection dynamics. There were six sets of isolates obtained 
from the same patients (a total of 13 strains; see Table S1 in the 
supplemental material). The UH8807-UH8907, UH8407-UH8707, 
and UH8107-UH9907 pairs were indistinguishable on 
the basis of SNV patterns and gene contents. One triplet of strains 
from the same patient was composed of two nearly identical 
strains of the same clade (UH3807 and UH6207) and a third of a 
different clade (UH6107) that enabled detection of the apparent 
plasmid transfer event described above. UH9007 and UH9707, 
isolated 15 days apart, are indistinguishable by core SNV analysis 
and have the same unique ISAba1 insertion in the lipopolysaccharide 
(LPS) region that no other strains examined possess. Notably, 
they do differ in that UH9707 carries pABUH6 while UH9007 
does not. UH10007 and UH10107, isolated on successive days, are 
also indistinguishable by core SNV analysis but differ markedly in 
gene content. For example, UH10007 carries pABUH5 and 
aphA6, which UH10107 lacks, but lacks the Tn1548-like RI that is 
present in UH10107. Furthermore, UH10007 and UH10107 differ 
at the comM RI in that UH10007 has an ISAba1 copy inserted at the 
same location as the strains in clade D. 

ISAba1 insertion locations. The distribution of ISAba1 insertion 
sites among strains reinforced the major strain groups defined 
by SNV typing (Fig. 3). The pattern of shared sites was 
largely consistent with the phylogeny and in fact provides additional 
phylogenetic resolution in several cases. The relationships 
among several strains in clade A are poorly resolved by SNV analysis, 
but ISAba1 insertion locations indicate that UH5707 and 
UH7807 should have the same terminal node, for example. Strains 
in clade D had ISAba1 locations very distinct from those of the 
other GC2 strains examined, where the only insertion sites that are 
also present in other clades are adjacent to the blaADC and 
blaOXA-51-like genes. 
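
The way shared insertion sites can add resolution beyond SNVs can be sketched in a few lines. This is an illustrative assumption about how such a comparison might be organized, not the procedure used to build Fig. 3; the site coordinates and strain labels are hypothetical.

```python
from typing import Dict, List, Set


def informative_sites(site_to_strains: Dict[int, Set[str]],
                      n_strains: int) -> Dict[int, Set[str]]:
    """Keep insertion sites found in at least two strains but not in all.

    Sites unique to one strain, or present in every strain, cannot group
    strains and are dropped.
    """
    return {site: strains for site, strains in site_to_strains.items()
            if 2 <= len(strains) < n_strains}


def sites_exclusive_to_pair(site_to_strains: Dict[int, Set[str]],
                            a: str, b: str) -> List[int]:
    """Sites carried by strains a and b and by no other strain."""
    return sorted(site for site, strains in site_to_strains.items()
                  if strains == {a, b})


# Hypothetical insertion coordinates mapped to the strains carrying them.
toy_sites = {
    100: {"s1", "s2"},
    200: {"s1", "s2", "s3", "s4"},
    300: {"s3"},
    400: {"s1", "s2", "s3"},
}
print(sorted(informative_sites(toy_sites, n_strains=4)))
print(sites_exclusive_to_pair(toy_sites, "s1", "s2"))
```

An insertion site found in exactly two strains is the kind of evidence that would place those strains on the same terminal node even when their core SNVs do not separate them from their neighbors.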

DISCUSSION 

A critical aspect of preventing nosocomial infections is under- 
standing the transmission patterns of pathogens. Transmission 
patterns are particularly important for infections caused by MDR 
organisms that are difficult to treat. WGS analysis is now used to 
characterize nosocomial outbreaks (3, 4, 6) and obtain unprece- 
dented insights. For example, Didelot et al. (3) showed that many 
Clostridium difficile infections could not be explained by transmis- 
sion between symptomatic cases, which suggests that alternative 
transmission routes and the role of asymptomatic carriers need to 
be explored further. In contrast, smaller studies of Klebsiella pneu- 
moniae and A. baumannii isolates have shown clear evidence of 
patient-to-patient transmission (7, 31). 

Our approach was designed to evaluate the diversity of strains 
present during a short time period in a tertiary-care hospital along 
with its affiliated community hospitals and an extended-care cen- 
ter (ECC). We found extensive microdiversity among these 
strains, such that most of the strains examined represented dis- 
tinct variants. Typing by conventional methods such as MLST 
suggests that the primary ST representing 39 of 49 strains arose 
from a single founder and spread clonally. The robust phylogeny 
constructed from core SNVs and gene content comparisons, how- 
ever, revealed a much more complicated relationship among 
A. baumannii strains within the hospital system. The strains are 
composed of highly similar yet distinct lineages within ST2 (or 
GC2). The presence of reference strains from disparate geographic 
locations worldwide interspersed among UH strains in the core 
SNV phylogeny supports the idea that these branches diverged 
prior to entering the UH hospital system and long enough ago that 
the descendants are globally dispersed. The limited geographic 
clustering of strains within the UH hospital system suggests the 
existence of either considerably more independent founders or 
rapid mixing of strains among hospitals. 

There is also evidence of clonal transmission within the hospi- 
tal system, as observed in clade D, where five out of six strains 
originated from the same hospital in the same time frame and 
have indistinguishable gene contents. Compared to the other 
clade D strains, UH10707 is more divergent in gene content: it 
lacks the pABUH1 plasmid carrying blaOXA-23 and aphA6, carries 
a different blaOXA-51-like allele, and was isolated later and in a 
different hospital. With the available information, it is not possible 
to say whether UH10707 lost the plasmid from a recent ancestor 
it has in common with other clade D strains or whether the 
plasmid was acquired by the other clade D strains after UH10707 
split from them. This example highlights the complexity in 
determining clonality and divergence underlying the phylogenetic 
patterns and transmission routes observed in these strains. 

Antibiotic resistance mechanisms. 
The A. baumannii strains investigated 
here were all MDR strains, except for the 
three with novel STs (i.e., UH5107, 
UH5207, and UH6507), and genomic 
analyses identified an extensive and re- 
dundant repertoire of resistance genes 
(Fig. 4). The allelic distributions of two intrinsic chromosomal 
β-lactamase resistance genes were generally consistent within each 
primary clade but variable even across the closely related strains 
within the ST2 group. The blaADC allelic distribution is suggestive 
of at least some mutation events leading to variation, but elevated 
SNV density (10 SNVs/kb) in clade D and other GC2 clades for 15 kb 
surrounding blaADC reveals that recombination is also present (data 
not shown). Hamidian and Hall (48) have argued that ISAba1 was 
independently acquired upstream of blaADC variants, but it is also 
possible that the insertion of IS elements predated subsequent 
variation through mutation and recombination. Strains in clade E, 
which have a truncated and most likely nonfunctional ADC because of 
a frameshift, are generally susceptible to the cephalosporins 
(ceftazidime and cefepime) tested, and no other cephalosporinase 
genes are present. Other UH GC2 strains possess the 
extended-spectrum ADC variant ADC30, which confers broader substrate 
activity, in agreement with the observed cephalosporin resistance 
phenotypes (47). The allelic distribution of the blaOXA-51-like 
β-lactamase gene, and whether ISAba1 is present upstream, is best 
explained by a recombination event, as the coding variant in clades 
B and D is linked to the presence of an upstream ISAba1 element, 
suggesting that the entire region was replaced by homologous 
recombination in one or both of the lineages. The UH strains 
carrying the blaOXA-51-like variant associated with an upstream 
ISAba1 element (i.e., those in clades B and D) are all resistant to 
the carbapenems tested, consistent with the effect of a promoter 
provided by the IS. 

RIs are a major source of variability contributing to microdiversity 
within the strains examined. To our knowledge, this is the 
first report of an RI at the ACICU_02399 location. Tn1548 has 
been reported in a number of members of the family Enterobacteriaceae, 
and similar sequences have been observed in draft A. baumannii 
genomes and plasmid pZJ06. The chromosomal location 
of the closely related sequences in ACICU_02399 reinforces the 
role that antimicrobial selection continues to play in driving strain 
differentiation. This is further evidenced by the presence of redundant 
genetic resistance mechanisms in RIs and elsewhere in the 
UH strains, suggesting continued pressure on the pathogen to 
enhance its ability to escape antimicrobial therapy. For example, 
several strains in clades B and D have both an acquired 
carbapenemase gene (blaOXA-23) and ISAba1-driven blaOXA-51-like. 
Aminoglycoside resistance mechanisms are particularly redundant. 
Tn1548 contains the armA gene for an rRNA methyltransferase (target 
modification) that confers resistance to amikacin, gentamicin, and 
kanamycin (49) and genes that encode drug-specific 
aminoglycoside-modifying enzymes, aac(6')-Ib (gentamicin) and aphA1 
(kanamycin). There are also examples of genotypic redundancy, as is 
the case for those strains (e.g., clade A UH8107, UH9707, UH9907, 
UH15208, and UH16008) carrying copies of aphA1 in both the comM 
and astA RIs. 

Mobile elements show pronounced variability among strains such that 
the collective distribution of the plasmids and phage results in 
unique gene sets for many highly similar strains (Fig. 2B). This 
includes pABUH1, the pACICU2-like plasmid that harbors blaOXA-23 and 
aphA6. The pACICU2-like backbone is distributed worldwide, but among 
reference strains, only UMB001 has the pABUH1-like variant that 
carries the resistance genes (Fig. 2B). The association of 
pACICU2-like plasmids and blaOXA-23, but not aphA6, has been noted 
before by Bertini et al. (45), who also demonstrated the ability of 
the plasmid to be transferred by conjugation. The two large (~110-kb) 
plasmids carrying phage-derived sequences, pABUH4 and pABUH5, have 
not been previously reported, although they are present in multiple 
reference strains. It is difficult to predict whether they confer any 
fitness advantage because of the overwhelming number of ORFs 
annotated as hypothetical and conserved hypothetical proteins, but 
the UH10707 assembly obtained by the HGAP does place a copy of aphA1 
and two macrolide resistance genes in an element containing four 
copies of IS26 on pABUH5. The detection of highly similar plasmids in 
reference strains from Australia and China indicates that this 
plasmid is circulating globally. 

Recombination. Recombination has 
been reported to contribute to A. bau- 
mannii genome change (32), but the ex- 
tent of recombination has not been stud- 
ied in depth. Most of the genome regions 
in the strains sequenced here exhibited low levels of SNV density 
(<0.1 SNV/kbp). One exception is a recombinant region centered 
around a 50-kb span in which a putative heme utilization region, 
including a gene coding for a heme oxygenase (hemO) (50, 51), is 
variably present. The 12-kb heme utilization region is also highly 
variable among A. baumannii reference strains, making it difficult 
to assess whether this region was present in a common ancestor 
and lost by some strains or whether it was gained and subsequently 
transferred via recombination to different lineages as initially 
observed by Antunes et al. (50). The region is present in some, but 
not all, non-A. baumannii Acinetobacter isolates, further compli- 
cating inference of the ancestral state and whether this region has 
contributed to the success of A. baumannii as a pathogen. Two 
strains that are grouped together by SNV analysis in clade A, 
UH5307 and UH19908, were isolated 5 months apart from differ- 
ent patients, but both were from the same extended-care facility 
and are an example of a recombination event in this region that is 
likely to have occurred within the UH hospital setting. There is 
also evidence of smaller-scale recombination events in the allelic 
distribution of blaADC and blaOXA-51-like variants. Additionally, 
recombination events are not restricted to the chromosome, as 
observed by comparison of the gene contents of the two plasmids 
carrying blaOXA-40. However, it is not possible to determine the full 
extent of chromosomal homologous recombination by this ap- 
proach, as exchanges that involve closely related chromosomal 
segments are difficult to detect by using an SNV density metric. 

Surface polysaccharide variation. While there was no evi- 
dence of recombination occurring around the surface polysaccha- 
ride locus among the GC2 clades within the hospital system, there 
were multiple types encountered when the more diverse reference 
strains were examined, consistent with the previous characteriza- 
tion of these regions as a significant source of variability within 
A. baumannii (46). The large number of unique variants of these 
two loci when including non-UH strains for comparison empha- 
sizes the potential selective pressure acting on these regions that 
play a role in interaction with the host immune system. However, 
the presence of reference strains with the same organization as 
that observed in the UH strains indicates that, on some level, these 
regions are stable. The OC locus is less variable, with one predom- 
inant type detected, but interestingly, there are multiple cases of 
independent ISAba1 insertion events within the OC region. It is 
not clear whether these are simply random insertions that are 
selectively neutral or whether they affect the cell surface and po- 
tentially the host response. 

One primary difference between GC2 strains and the other 
lineages examined here could have implications for surface poly- 
saccharide structure and host interactions. Previous bioinfor- 
matic and biochemical analyses suggested that A. baumannii pro- 
duces not LPS but LOS, with the distinction being that LOS lacks 
the O-antigen sugar repeat unit ligated to the core oligosaccharide 
because of the lack of an identifiable waaL O-antigen ligase gene 
(46). Our analysis showed that the GC2 strains, except the 
UH7007 clade, have two adjacent ORFs with WaaL conserved 
domains near the pilus locus (at 3.58 Mb in ACICU), where pre- 
vious sequence analyses have detected only one such ORF at that 
location (46). The WaaL domain is associated both with genes 
encoding O-antigen ligases and with pglL genes involved in 
O-linked protein glycosylation (52), but assigning a definite func- 
tion to a gene containing this domain on the basis of sequence data 
alone has previously not been possible. Recently, a hidden Markov 
model (HMM) protein family was developed to discriminate be- 
tween WaaL proteins involved in protein glycosylation and those 
involved in O-antigen linkage (52). Of the two adjacent ORFs in 
the GC2 strains, one (ACICU YP_001848035) is most certainly 
pglL on the basis of this HMM and previous characterization of 
protein glycosylation in A. baumannii (53). However, the second 
ORF with a WaaL domain (ACICU YP_001848036) does not have 
the pglL-specific domain, as assessed by using the HMM, leading 
us to speculate that it encodes the O-antigen ligase rather than a 
second protein involved in mediating protein glycosylation. Fur- 
ther work is necessary to establish whether this second ORF is 
expressed and whether strains harboring this second ORF are ca- 
pable of producing LPS or, alternatively, glycosylating different 
surface proteins. 

ISAba1 distribution. IS elements are important drivers of genome 
change in A. baumannii and other pathogens, as demonstrated 
by the roles they play in mobilizing antibiotic resistance 
genes, modulating gene expression, and mediating gene deletions, 
as observed in this study and others (54). The analysis undertaken 
here represents the first attempt to map the distribution of an IS 
within an A. baumannii genome and compare its locations among 
strains. The chromosomal distribution of ISAba1 locations reveals 
not only that it is abundant in UH strains, with strains in clade D 
possessing >20 copies, but also that the whole genome is impacted. 
ISAba1 may be a recent acquisition by GC2 strains, as it is 
absent from ACICU, and may have contributed to the spread of 
this lineage. The facts that the locations of ISAba1 insertion sites 
can be used for phylogenetic analysis and are concordant with the 
core phylogeny suggest that, once inserted, these elements tend to 
stay in the chromosome, as no clear cases of ISAba1 loss could be 
inferred from the genomes analyzed here. 

Other gene content variability with potential fitness implica- 
tions. Beyond variable antibiotic resistance gene and mobile ele- 
ment presence, there is substantial variation among strains in gene 
content that may have fitness implications. In addition to the pu- 
tative heme utilization recombinant region and the region deleted 
with the insertion of AbGRI2 in many of the GC2 strains, there are 
multiple examples of IS-mediated gene deletions, including the 
deletion of the entire T6SS operon, and the csuE/aspartate metabolism 
regions from strains in clade C. These regions have been 
shown to be involved in interbacterial interactions (55, 56), and 
the initial adherence of cells to abiotic surfaces (57), respectively. 
One hypothesis is that the MDR phenotype conferred by the ex- 
tensive antibiotic resistance gene repertoire renders the T6SS re- 
gions less important, as interbacterial competition is most likely 
reduced over the course of antibiotic therapy. The T6SS region is 
conserved among all of the other A. baumannii isolates examined, 
reinforcing the monophyletic structure of clade D. The persis- 
tence of cells with the csuE region deletion as indicated by the 
recovery of isolates with these deletions over the span of several 
months from multiple source types, including catheters, is sur- 
prising given the prediction of the region's importance for persis- 
tence on hospital surfaces (14, 57, 58). Furthermore, the csuE de- 
letion is hinted at in A. baumannii strains from Latvia (59), and 
these genes are absent from the MDR-ZJ06 assembly, suggesting 
that this loss occurred before strains entered the UH hospital sys- 
tem and that these strains are persisting in hospital environments 
despite this loss. The deletion of the csuE gene also has diagnostic 
significance, as it has been proposed as a molecular marker for 
strain typing efforts (21). 

Coinfection as a facilitator of LGT. For recombination and 
LGT to occur, two strains with differing gene contents must inter- 
act in a way that facilitates DNA exchange. Most of the studies 
done to date have characterized single isolates from a patient or 
assumed single strain infections when developing treatment strat- 
egies for patients. Data from the UH strains suggest that patients 
can be colonized or infected by multiple strains and that they are 
capable of interacting within the patient. This is best exemplified 
by the likely transfer of pABUH1 among the UH3807-UH6107- 
UH6207 isolates. Other candidates for LGT in the context of coin- 
fection are strains UH10007 and UH10107, which have differ- 
ences in RIs and plasmid contents that cannot be explained by 
gene loss alone despite being isolated from the same patient only 1 
day apart and being nearest neighbors in the SNV phylogeny. We 
hypothesize that an unsampled strain was present as a coinfecting 
strain and transferred genes to UH10007. UH9007 and UH9707 
also varied in plasmid content, with UH9707 carrying pABUH6, 
which is missing from UH9007. In this case, it is difficult to deter- 
mine whether these were initially two different strains that colo- 
nized the patient or, alternatively, whether this is an example of 
plasmid loss from one isolate occurring within the patient. Re- 
peated infection with genetically distinct A. baumannii strains has 
been reported in the context of colistin treatment as well (60). 
Taken together, these findings suggest in vivo genetic exchange 
among different strains occurring within patients and provide a 
mechanism by which A. baumannii obtains new genetic informa- 
tion, including antibiotic resistance genes. Moreover, divergent 
strains with different STs were also encountered in the UH strains 
and represent potential opportunities for new genetic material to 
be introduced into GC2 strains through such a mechanism. 

Conclusion. The presence of closely related strains in Asia and 
Europe and the presence of mobile elements recovered from geo- 
graphically widespread areas highlight the global dissemination of 
these organisms and genes. By investigating dynamics within one 
hospital system, we were able to identify evolutionary mecha- 
nisms that contribute to this process at a local scale. There was 
limited spatial or temporal clustering of strain types and gene 
contents within different hospital components, indicating that an 
endemic and interacting A. baumannii population exists either 
within the UH hospital system or in patients colonized with the 
bacteria. The movement of patients and staff between the affiliated 
hospital locations may contribute to strain mixing and diversifi- 
cation. Previous work has hypothesized that ECCs represent res- 
ervoirs for health care-associated pathogens, including A. bau- 
mannii (61, 62). The observation that the same lineages and gene 
contents observed in the tertiary-care hospital and regional hos- 
pitals are also detected in the ECCs provides support for this hy- 
pothesis. Alternatively, asymptomatic carriers could facilitate a 
standing A. baumannii population circulating in the community. 
While the lack of high-resolution patient data in this study pre- 
cludes a rigorous assessment of specific transmission routes, the 
data indicate that transmission and gene flow occurred among the 
hospitals. 

Summary. Strain level whole-genome analyses of A. bauman- 
nii isolated from one integrated United States hospital system 
demonstrate that nearly every strain was unique despite being 
indistinguishable by conventional sequence typing methods and 
in some cases by core SNV typing. This study improves our un- 
derstanding of the evolutionary processes that contribute to the 
emergence of MDR nosocomial pathogens, including genomic 
mixing during coinfection events. The analyses reported here lead 
to an improved conceptual model of A. baumannii population 
dynamics within hospitals that suggests that endemic strains exist 
and interact with one another, with the additional periodic influx 
of novel strains that potentially bring in new genetic material. 
These findings highlight the importance of identifying and screen- 
ing high-risk patients, such as those who come from extended- 
care facilities or have previous antibiotic exposure, and the 
importance of developing rapid diagnostic tools to characterize 
antibiotic gene content in individual isolates. In addition to vari- 
ability in antibiotic resistance determinants, other genomic re- 
gions demonstrate dynamic genomic change over short evolu- 
tionary time spans and may point to other aspects of A. baumannii 
physiology that contribute to its success as a nosocomial patho- 
gen. 

MATERIALS AND METHODS 

Strain isolation and genotypic and phenotypic characterization. Strains 
were isolated between 2007 and 2008 from an integrated health care sys- 
tem, the UH of Cleveland in Cleveland, OH. The UH are composed of a 
main tertiary-care facility, affiliated regional hospitals, and an ECC. The 
hospital system detected an increased prevalence of MDR A. baumannii 
strains beginning in 2007. At that point, isolates were subjected to addi- 
tional analysis to characterize antibiotic resistance phenotypes and STs via 
MLST (37). From this collection of isolates, 49 isolates were selected for 
sequencing to capture a range of hospital locations, dates, genotypes, and 
antibiotic resistance phenotypes (see Table S1 in the supplemental 
material). Additionally, sets of strains isolated from the same patient were 
selected to investigate potential within-host strain interactions. 

To put the sequenced strains (i.e., UH strains) in a phylogenetic con- 
text and assess shared and clade-specific gene contents, genomes were 
compared to reference draft and complete A. baumannii genomes avail- 
able from NCBI as of January 2013 (see Table S2 in the supplemental 
material). 

DNA preparation, library construction, sequencing, and assembly. 

DNA was isolated with the MasterPure Gram-positive DNA purification 
kit (Epicenter Biosciences). Illumina sequencing libraries were prepared 
by using either Nextera or TruSeq kits with indexed-encoded adapters 
from Illumina, according to the manufacturer's instructions. Libraries 
were pooled for sequencing on Illumina GAIIx or HiSeq, and paired-end 
sequence reads were obtained representing 50- to 200-fold genome cov- 
erage. Strains representing two distinct ST2 lineages (UH9907 and 
UH10707) were also subjected to SMRT sequencing on a PacBio. PacBio 
sequencing resulted in long reads (median, ~3.5 kbp) and ~10× to 20× 
coverage of error-corrected reads. Illumina sequence data were assembled 
by using Velvet (63). A range of k-mer values was evaluated, and the 
assembly with the largest N50 was selected for annotation and analysis. 
The PacBio sequence was assembled by using the HGAP (64). Several 
hybrid assemblies combining PacBio with Illumina sequence data were 
not sufficiently superior to the assembly obtained by the HGAP (data not 
shown). For information on assembly quality, see Table S3 in the supple- 
mental material. Assemblies of Illumina data generally had contig N50 
values of >100 kbp. The PacBio assemblies had contig N50 values of 
>1 Mbp. 
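
The selection of the Velvet assembly with the largest N50 can be illustrated with a minimal Python sketch. This is not the pipeline used in the study; the function names and the toy contig lengths are hypothetical, and the N50 definition assumed here is the common one (the contig length at which the cumulative sum of contigs, sorted from longest to shortest, first reaches half of the total assembly length).

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together contain at least half of the total assembled bases."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def pick_best_assembly(assemblies):
    """assemblies maps a k-mer size to the list of contig lengths it
    produced; return the k-mer size whose assembly has the largest N50."""
    return max(assemblies, key=lambda k: n50(assemblies[k]))


# Hypothetical contig lengths (bp) for three k-mer settings.
example = {
    31: [50_000, 40_000, 30_000, 20_000],
    41: [90_000, 30_000, 20_000],
    51: [60_000, 60_000, 10_000, 10_000],
}
best_k = pick_best_assembly(example)
```

In practice the contig lengths would be read from each assembler output file; N50 alone does not guard against misassemblies, so it is only one possible selection criterion.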

Genome annotation. Genes were annotated in each genome assembly 
by using an automated annotation system (65). This annotation pipeline 
defines protein- and RNA-coding genes and assigns names (66) and func- 
tional classifications based on TIGRFAMs (67), Pfams (68), and Char- 
ProtDBs (69). 

Core phylogeny construction. SNVs were identified on the basis of 
whole-genome alignment of 69 UH and reference assemblies with the 
SNV export functionality within Mauve (70). The full list of SNVs was 
then filtered by requiring that at least 67 genomes had a sequence at the 
variable position. These 162,180 candidate core SNVs were subsequently 
examined for evidence of recombination by plotting SNV density across 
1-kb bins using the finished genome of the GC2 A. baumannii ACICU 
(NC_010611.1) as a reference, where regions with elevated SNV density 
(pairwise, >10 SNVs/kb) were then excluded from core phylogeny con- 
struction. This filtering process yielded 129,899 presumed nonrecombi- 
nant SNVs, which were used to generate a core phylogeny (see Table S4 in 
the supplemental material). A maximum-likelihood tree was constructed 
by using RAxML (71) with 100 bootstrap replicates. 
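
The two filtering steps described above (a minimum number of genomes with sequence at each site, then exclusion of 1-kb bins with elevated pairwise SNV density) can be sketched as follows. This is an illustrative sketch only: the data layout, the function names, and the toy data are assumptions, and the actual analysis relied on the Mauve SNV export rather than on this code. The thresholds of 67 genomes, 1-kb bins, and >10 SNVs/kb are taken from the text.

```python
from collections import Counter

MIN_GENOMES = 67   # minimum genomes with sequence at a site (from the text)
BIN_SIZE = 1_000   # 1-kb bins along the reference genome
MAX_PER_BIN = 10   # bins with more SNVs than this are flagged


def core_snvs(snvs, min_genomes=MIN_GENOMES):
    """Keep sites where at least `min_genomes` genomes have a base call.

    snvs maps reference position -> {genome name: base or None}.
    """
    return {
        pos: calls
        for pos, calls in snvs.items()
        if sum(base is not None for base in calls.values()) >= min_genomes
    }


def differing_positions(snvs, genome_a, genome_b):
    """Sorted positions where both genomes have a call and the calls differ."""
    out = []
    for pos, calls in snvs.items():
        a, b = calls.get(genome_a), calls.get(genome_b)
        if a is not None and b is not None and a != b:
            out.append(pos)
    return sorted(out)


def dense_bins(positions, bin_size=BIN_SIZE, max_per_bin=MAX_PER_BIN):
    """Return the indices of bins holding more than `max_per_bin` SNVs."""
    counts = Counter(pos // bin_size for pos in positions)
    return {b for b, n in counts.items() if n > max_per_bin}


def drop_dense(snvs, bins, bin_size=BIN_SIZE):
    """Remove every SNV that falls inside a flagged bin."""
    return {pos: c for pos, c in snvs.items() if pos // bin_size not in bins}


# Toy data (hypothetical): 12 SNVs packed into bin 2, 2 SNVs in bin 5,
# and one site (7000) lacking a call in genome "s1".
toy = {2000 + 50 * i: {"ref": "A", "s1": "G"} for i in range(12)}
toy.update({5100: {"ref": "C", "s1": "T"}, 5200: {"ref": "G", "s1": "A"}})
toy[7000] = {"ref": "A", "s1": None}

core = core_snvs(toy, min_genomes=2)  # threshold lowered for the toy data
flagged = dense_bins(differing_positions(core, "ref", "s1"))
kept = drop_dense(core, flagged)
```

With real data the density step would be repeated for each genome pair of interest, and the union of flagged bins would be excluded before tree construction.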
Pangenome characterization: PanOCT and the gene content tree. 

Genes were clustered into ortholog sets with the pangenome analysis pro- 
gram PanOCT (72), which considers the gene neighborhood when iden- 
tifying orthologs. A minimum identity of 70% was required for ORFs to 
be placed in the same cluster. A genome-wide measure of shared gene 
content was calculated by computing a distance matrix based on the num- 
ber of genes shared by each pair of genomes. This distance matrix was used 
to generate a phylogenetic tree with FASTME (73). 
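
A shared-gene-content distance of this kind can be sketched in a few lines of Python. The exact distance formula is not stated in the text, so the Jaccard distance used here is an assumption, as are the function names, the toy cluster identifiers, and the PHYLIP-style output layout (a square matrix of the sort that distance-based tree builders commonly accept).

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |shared clusters| / |clusters in either genome| (assumed metric)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(gene_sets):
    """gene_sets maps genome name -> set of ortholog cluster identifiers."""
    names = sorted(gene_sets)
    dist = {n: {m: 0.0 for m in names} for n in names}
    for a, b in combinations(names, 2):
        d = jaccard_distance(gene_sets[a], gene_sets[b])
        dist[a][b] = dist[b][a] = d
    return names, dist


def to_phylip(names, dist):
    """Format the matrix as a PHYLIP-style square distance matrix."""
    lines = [str(len(names))]
    for n in names:
        row = " ".join(f"{dist[n][m]:.4f}" for m in names)
        lines.append(f"{n:<10} {row}")
    return "\n".join(lines)


# Hypothetical ortholog cluster content for three genomes.
toy = {
    "strainA": {"c1", "c2", "c3", "c4"},
    "strainB": {"c1", "c2", "c3"},
    "strainC": {"c1", "c5"},
}
names, dist = distance_matrix(toy)
phylip_text = to_phylip(names, dist)
```

Because the distance depends only on cluster presence or absence, the result is sensitive to the clustering threshold (70% identity in this study) and to draft assembly completeness.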

ISAba1 analysis. Insertion sites for the ISAba1 element were inferred 
from draft genome assemblies by identifying fragments of the element at 
contig edges and mapping the adjacent flanking sequence to the finished 
reference genome most closely related to the ST2 UH strains, 
A. baumannii TYTH-1 (NC_018706.1) (41). 
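
Once flanking sequences have been mapped to reference coordinates, insertion sites can be compared among strains. The sketch below assumes that the mapping step has already produced a list of reference positions per strain; the 10-bp tolerance for treating nearby coordinates as one site, the function names, and the toy coordinates are all hypothetical and are not taken from the study.

```python
def cluster_sites(sites_by_strain, tolerance=10):
    """Group insertion coordinates from all strains into shared sites.

    A coordinate joins the current site if it lies within `tolerance` bp
    of the last coordinate already assigned to that site.
    """
    all_sites = sorted(
        (pos, strain)
        for strain, positions in sites_by_strain.items()
        for pos in positions
    )
    clusters = []  # each entry: {"start": int, "end": int, "strains": set}
    for pos, strain in all_sites:
        if clusters and pos - clusters[-1]["end"] <= tolerance:
            clusters[-1]["end"] = pos
            clusters[-1]["strains"].add(strain)
        else:
            clusters.append({"start": pos, "end": pos, "strains": {strain}})
    return clusters


def shared_sites(clusters, strain_a, strain_b):
    """Start coordinates of the sites occupied in both strains."""
    return [
        c["start"] for c in clusters if {strain_a, strain_b} <= c["strains"]
    ]


# Hypothetical reference coordinates of mapped insertion sites.
toy = {
    "s1": [1000, 50000, 120000],
    "s2": [1004, 50002],
    "s3": [120000, 300000],
}
clusters = cluster_sites(toy)
s1_s2 = shared_sites(clusters, "s1", "s2")
s1_s3 = shared_sites(clusters, "s1", "s3")
```

The resulting site-by-strain table is the kind of presence/absence pattern that can be laid against the SNV phylogeny; sites missing from a draft assembly may reflect incomplete contigs rather than true absence.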

Nucleotide sequence accession numbers. The sequences obtained in 
this study were deposited in the GenBank database under accession 
numbers AYGS00000000 to AYEW00000000 for the Illumina Velvet 
assemblies and AYOH00000000 (UH9907) and AYGT00000000 (UH10707) for 
the PacBio HGAP assemblies. 